Adversarial machine learning has been both a major concern and a hot topic recently, especially with the ubiquitous use of deep neural networks in the current landscape. Adversarial attacks and defenses are usually likened to a cat-and-mouse game in which defenders and attackers evolve over the time. On one hand, the goal is to develop strong and robust deep networks that are resistant to malicious actors. On the other hand, in order to achieve that, we need to devise even stronger adversarial attacks to challenge these defense models. Most of existing attacks employs a single $\ell_p$ distance (commonly, $p\in\{1,2,\infty\}$) to define the concept of closeness and performs steepest gradient ascent w.r.t. this $p$-norm to update all pixels in an adversarial example in the same way. These $\ell_p$ attacks each has its own pros and cons; and there is no single attack that can successfully break through defense models that are robust against multiple $\ell_p$ norms simultaneously. Motivated by these observations, we come up with a natural approach: combining various $\ell_p$ gradient projections on a pixel level to achieve a joint adversarial perturbation. Specifically, we learn how to perturb each pixel to maximize the attack performance, while maintaining the overall visual imperceptibility of adversarial examples. Finally, through various experiments with standardized benchmarks, we show that our method outperforms most current strong attacks across state-of-the-art defense mechanisms, while retaining its ability to remain clean visually.
translated by 谷歌翻译
Pareto Front Learning (PFL) was recently introduced as an effective approach to obtain a mapping function from a given trade-off vector to a solution on the Pareto front, which solves the multi-objective optimization (MOO) problem. Due to the inherent trade-off between conflicting objectives, PFL offers a flexible approach in many scenarios in which the decision makers can not specify the preference of one Pareto solution over another, and must switch between them depending on the situation. However, existing PFL methods ignore the relationship between the solutions during the optimization process, which hinders the quality of the obtained front. To overcome this issue, we propose a novel PFL framework namely \ourmodel, which employs a hypernetwork to generate multiple solutions from a set of diverse trade-off preferences and enhance the quality of the Pareto front by maximizing the Hypervolume indicator defined by these solutions. The experimental results on several MOO machine learning tasks show that the proposed framework significantly outperforms the baselines in producing the trade-off Pareto front.
translated by 谷歌翻译
The introduction of high-quality image generation models, particularly the StyleGAN family, provides a powerful tool to synthesize and manipulate images. However, existing models are built upon high-quality (HQ) data as desired outputs, making them unfit for in-the-wild low-quality (LQ) images, which are common inputs for manipulation. In this work, we bridge this gap by proposing a novel GAN structure that allows for generating images with controllable quality. The network can synthesize various image degradation and restore the sharp image via a quality control code. Our proposed QC-StyleGAN can directly edit LQ images without altering their quality by applying GAN inversion and manipulation techniques. It also provides for free an image restoration solution that can handle various degradations, including noise, blur, compression artifacts, and their mixtures. Finally, we demonstrate numerous other applications such as image degradation synthesis, transfer, and interpolation.
translated by 谷歌翻译
当前的3D分割方法很大程度上依赖于大规模的点状数据集,众所周知,这些数据集众所周知。很少有尝试规避需要每点注释的需求。在这项工作中,我们研究了弱监督的3D语义实例分割。关键的想法是利用3D边界框标签,更容易,更快地注释。确实,我们表明只有仅使用边界框标签训练密集的分割模型。在我们方法的核心上,\ name {}是一个深层模型,灵感来自经典的霍夫投票,直接投票赞成边界框参数,并且是专门针对边界盒票的专门定制的群集方法。这超出了常用的中心票,这不会完全利用边界框注释。在扫描仪测试中,我们弱监督的模型在其他弱监督的方法中获得了领先的性能(+18 MAP@50)。值得注意的是,它还达到了当前完全监督模型的50分数的地图的97%。为了进一步说明我们的工作的实用性,我们在最近发布的Arkitscenes数据集中训练Box2mask,该数据集仅使用3D边界框注释,并首次显示引人注目的3D实例细分掩码。
translated by 谷歌翻译
由于GaN潜在空间的勘探和利用,近年来,现实世界的图像操纵实现了奇妙的进展。 GaN反演是该管道的第一步,旨在忠实地将真实图像映射到潜在代码。不幸的是,大多数现有的GaN反演方法都无法满足下面列出的三个要求中的至少一个:重建质量,可编辑性和快速推断。我们在本研究中提出了一种新的两阶段策略,同时适合所有要求。在第一阶段,我们训练编码器将输入图像映射到StyleGan2 $ \ Mathcal {W} $ - 空间,这被证明具有出色的可编辑性,但重建质量较低。在第二阶段,我们通过利用一系列HyperNetWorks来补充初始阶段的重建能力以在反转期间恢复缺失的信息。这两个步骤互相补充,由于Hypernetwork分支和由于$ \ Mathcal {W} $ - 空间中的反转,因此由于HyperNetwork分支和优异的可编辑性而相互作用。我们的方法完全是基于编码器的,导致极快的推断。关于两个具有挑战性的数据集的广泛实验证明了我们方法的优越性。
translated by 谷歌翻译
Diffusion models are rising as a powerful solution for high-fidelity image generation, which exceeds GANs in quality in many circumstances. However, their slow training and inference speed is a huge bottleneck, blocking them from being used in real-time applications. A recent DiffusionGAN method significantly decreases the models' running time by reducing the number of sampling steps from thousands to several, but their speeds still largely lag behind the GAN counterparts. This paper aims to reduce the speed gap by proposing a novel wavelet-based diffusion structure. We extract low-and-high frequency components from both image and feature levels via wavelet decomposition and adaptively handle these components for faster processing while maintaining good generation quality. Furthermore, we propose to use a reconstruction term, which effectively boosts the model training convergence. Experimental results on CelebA-HQ, CIFAR-10, LSUN-Church, and STL-10 datasets prove our solution is a stepping-stone to offering real-time and high-fidelity diffusion models. Our code and pre-trained checkpoints will be available at \url{https://github.com/VinAIResearch/WaveDiff.git}.
translated by 谷歌翻译
Recent studies on adversarial images have shown that they tend to leave the underlying low-dimensional data manifold, making them significantly more challenging for current models to make correct predictions. This so-called off-manifold conjecture has inspired a novel line of defenses against adversarial attacks on images. In this study, we find a similar phenomenon occurs in the contextualized embedding space induced by pretrained language models, in which adversarial texts tend to have their embeddings diverge from the manifold of natural ones. Based on this finding, we propose Textual Manifold-based Defense (TMD), a defense mechanism that projects text embeddings onto an approximated embedding manifold before classification. It reduces the complexity of potential adversarial examples, which ultimately enhances the robustness of the protected model. Through extensive experiments, our method consistently and significantly outperforms previous defenses under various attack settings without trading off clean accuracy. To the best of our knowledge, this is the first NLP defense that leverages the manifold structure against adversarial attacks. Our code is available at \url{https://github.com/dangne/tmd}.
translated by 谷歌翻译
成功的人工智能系统通常需要大量标记的数据来从文档图像中提取信息。在本文中,我们研究了改善人工智能系统在理解文档图像中的性能的问题,尤其是在培训数据受到限制的情况下。我们通过使用加强学习提出一种新颖的填充方法来解决问题。我们的方法将信息提取模型视为策略网络,并使用策略梯度培训来更新模型,以最大程度地提高补充传统跨凝结损失的综合奖励功能。我们使用标签和专家反馈在四个数据集上进行的实验表明,我们的填充机制始终提高最先进的信息提取器的性能,尤其是在小型培训数据制度中。
translated by 谷歌翻译
尽管在过去的几年中取得了重大进展,但歧义仍然是面部表情识别(FER)的关键挑战。它可能导致嘈杂和不一致的注释,这阻碍了现实世界中深度学习模型的性能。在本文中,我们提出了一种新的不确定性标签分布学习方法,以提高深层模型的鲁棒性,以防止不确定性和歧义。我们利用价值空间中的邻里信息来适应培训训练样本的情绪分布。我们还考虑提供的标签将其纳入标签分布时的不确定性。我们的方法可以轻松地集成到深层网络中,以获得更多的培训监督并提高识别准确性。在各种嘈杂和模棱两可的环境下,在几个数据集上进行了密集的实验表明,我们的方法取得了竞争成果,并且超出了最新的最新方法。我们的代码和模型可在https://github.com/minhnhatvt/label-distribution-learning-fer-tf上找到。
translated by 谷歌翻译
在过去的两年中,从2020年到2021年,Covid-19在包括越南在内的许多国家 /地区都破坏了预防疾病措施,并对人类生活和社会社区的各个方面产生了负面影响。此外,社区中的误导性信息和有关大流行的虚假新闻也是严重的情况。因此,我们提出了第一个基于越南社区的问题答复数据集,用于开发COVID-19的问题答案系统,称为UIT-VICOV19QA。该数据集包括从可信赖的医疗来源收集的4,500对提问,至少有一个答案,每个问题最多有四个独特的解释答案。除数据集外,我们还建立了各种深度学习模型作为基线,以评估数据集的质量,并通过BLEU,Meteor和Rouge-l等常用指标来进一步研究基准结果,以进行进一步的研究。我们还说明了对这些模型进行多个解释答案的积极影响,尤其是在变压器上 - 研究领域的主要结构。
translated by 谷歌翻译